AIExpermentLab Is Live And I Am Documenting Every Mistake
I opened a new repository today. It is called AIExpermentLab. The name is intentional. The typo is also intentional. This is a space for weird ideas. This is a space for public failure. This is a space for benchmarks that document what does not work as clearly as what does.
"Well, here we are again." The quote is from GLaDOS. The sentiment is mine. I am building a lab for experiments I cannot justify keeping private.
The Philosophy
AIExpermentLab is the public successor to my private MyTrainer codebase. The old codebase grew into a tangle of half-finished experiments behind a closed door. The new repository is the antidote. Every commit is public. Every checkpoint goes to HuggingFace. Every benchmark result is reproducible from the configs in the repo.
The workflow is simple. Start with a vanilla LLaMA-style transformer at the smallest reasonable scale. Train it. Benchmark it. Publish the checkpoint. Pick one experimental component from the backlog. Graft it on. Retrain. Re-benchmark. Keep the feature if it helps. Document why if it does not. Repeat until the model is too big to call tiny anymore.
baseline = train_vanilla_transformer()
for feature in backlog:
    model = graft_feature(baseline, feature)
    metrics = benchmark(model)
    if metrics.improved:
        keep(feature)
        baseline = model  # only a winning feature becomes the new baseline
    else:
        log_why_it_failed(feature)
# Measure twice. Cut once. Log everything.
The Target Series
The repository defines three size classes to prevent scope creep. Glint targets under one million parameters. Shard targets around fifty million. Prism targets around one hundred million. Each series unlocks only after the previous one stabilizes. This is discipline. This is also self-preservation.
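The budgets could live in config as data rather than vibes. A minimal sketch: the names and parameter counts below come straight from this post, while the dict layout and guard function are illustrative only.

# Hypothetical sketch: names and parameter counts are from this post,
# the structure is my own illustration, not the repo's actual config.
SERIES = {
    "Glint": 1_000_000,    # under one million parameters
    "Shard": 50_000_000,   # around fifty million
    "Prism": 100_000_000,  # around one hundred million
}

def within_budget(series: str, n_params: int) -> bool:
    """Scope-creep guard: reject any config that exceeds the series budget."""
    return n_params <= SERIES[series]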
The published model lineage on HuggingFace will continue under these names. Glint 1 already exists. The next Glint variants will emerge from this lab. The process will be visible. The mistakes will be visible. The progress will be visible.
The Feature Backlog
The repository queues dozens of experiments pulled from the old MyTrainer codebase. Each one will land behind an opt-in flag so we can run honest A/B tests; a rough sketch of that flag layer follows below. The backlog includes architecture experiments like recurrent depth cores, stable recurrent injection, Depth LoRA, adaptive halting, SleepGate, TRIM-KV retention gates, Engram memory blocks, manifold hyper-connections, COCONUT-style latent thinking, multi-token prediction heads, and per-layer embeddings.
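Something like this, minimally sketched. None of these field names exist in the repo yet; they are placeholders for whatever the real config grows into.

from dataclasses import dataclass

@dataclass
class ExperimentFlags:
    # Every backlog item defaults to off, so an all-False instance
    # is exactly the vanilla baseline: the control arm of the A/B test.
    recurrent_depth_core: bool = False
    depth_lora: bool = False
    adaptive_halting: bool = False
    multi_token_prediction: bool = False
    # ...one flag per backlog entry

baseline_flags = ExperimentFlags()                 # control
variant_flags = ExperimentFlags(depth_lora=True)   # one change at a time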
The backlog also includes training technique experiments like knowledge distillation, SPIN self-play fine-tuning, DPO and related preference optimization methods, GRPO reinforcement learning, progressive parameter grouping, anti-pattern unlikelihood loss, gradient-aware dynamic weighting, curriculum learning, model averaging, Muon and Crowfeather optimizers, various learning rate schedules, FIM augmentation, decontamination passes, online hard example mining, looping regularization, input token dropout, contrastive context loss, and crash recovery checkpoints.
Tokenizer experiments are queued too. Character-level tokenization with dynamic vocab extension. ByteLevel BPE with a default vocabulary of two thousand tokens. Metaspace BPE with a default vocabulary of five hundred tokens, for dense signal in tiny models.
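For the two BPE variants, the HuggingFace tokenizers library can express both defaults in a few lines. A minimal sketch, assuming a placeholder corpus.txt and made-up special tokens; only the vocab sizes come from the text above.

from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

def train_bpe(pre_tok, decoder, vocab_size):
    # Train a BPE tokenizer with the given pre-tokenizer and vocab budget.
    tok = Tokenizer(models.BPE())
    tok.pre_tokenizer = pre_tok
    tok.decoder = decoder
    trainer = trainers.BpeTrainer(vocab_size=vocab_size,
                                  special_tokens=["<pad>", "<eos>"])  # made-up specials
    tok.train(files=["corpus.txt"], trainer=trainer)  # placeholder corpus path
    return tok

byte_level = train_bpe(pre_tokenizers.ByteLevel(), decoders.ByteLevel(), 2000)
metaspace = train_bpe(pre_tokenizers.Metaspace(), decoders.Metaspace(), 500)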
The backlog is long. The discipline is short. I will tackle one feature at a time. I will document each attempt. I will accept that most ideas will not help. That is the point of a lab.
The Progress Log
The repository includes a running journal called the Progress Log. Day zero marks the bootstrap. The repository was created. The old MyTrainer codebase was audited. The full backlog was extracted and documented. This README was written. The next step is to stand up the vanilla LLaMA-style baseline, publish the first checkpoint, and run the first benchmark.
Future entries will follow a strict format. Branch and commit hash. Series name. Features changed. Compute hours. Metric deltas versus previous. Verdict on whether to keep or drop the feature. Prose notes explaining the outcome. This structure forces accountability. It also creates a searchable archive of what worked and what did not.
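One way to enforce that format is a record type. The fields below mirror the list above; the names and defaults are my sketch, not the repo's actual schema.

from dataclasses import dataclass, field

@dataclass
class ProgressEntry:
    branch: str            # e.g. "exp/depth-lora" (hypothetical)
    commit: str            # short commit hash
    series: str            # "Glint", "Shard", or "Prism"
    features_changed: list[str]
    compute_hours: float
    metric_deltas: dict[str, float] = field(default_factory=dict)  # metric -> delta vs previous
    verdict: str = "drop"  # "keep" or "drop"
    notes: str = ""        # prose explaining the outcome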
Why Open Source
The MyTrainer codebase showed what happens behind a closed door: half-finished experiments and no accountability. AIExpermentLab inverts that. Public commits, HuggingFace checkpoints, and benchmarks reproducible from the configs in this repo are the whole mechanism.
If you want to fork, ablate, or argue about a specific experiment, open an issue or a pull request. If you want to follow along, watch the Progress Log. If you want to contribute compute, support the project on Ko-fi. All tiers grant early access to models and datasets. That is the rule. That is the promise. That is the perk.
Tier 1 (Water): early access to models and datasets.
Tier 2: everything above, plus exclusive content, direct messages, and priority testing.
Tier 3: everything above, plus social media shout-outs, Discord access, and exclusive requests for dataset creation.
Final Thoughts
AIExpermentLab is live. The repository is public. The backlog is documented. The progress log is ready. The first baseline training begins soon. I will log every step. I will publish every checkpoint. I will accept every outcome.
If you want to watch the process, clone the repo. If you want to contribute ideas, open an issue. If you want to support the compute, visit Ko-fi. If you want to learn from the failures, read the Progress Log. All paths lead to progress. Some paths are just messier than others.
Built in public. Logged in public. Donations accepted. Mistakes guaranteed. Progress is weird. Transparency is non-negotiable.